Sufficient Dimensionality Reduction with Irrelevance Statistics
Authors
Abstract
The problem of unsupervised dimensionality reduction of stochastic variables while preserving their most relevant characteristics is fundamental for the analysis of complex data. Unfortunately, this problem is ill-defined, since natural datasets inherently contain alternative underlying structures. In this paper we address this problem by extending the recently introduced “Sufficient Dimensionality Reduction” feature extraction method [7] to use “side information” about irrelevant structures in the data. The use of such irrelevance information was recently demonstrated successfully in the context of clustering via the Information Bottleneck method [1]. Here we use this side-information framework to identify continuous features whose measurements are maximally informative about the main dataset but carry as little information as possible about the irrelevance dataset. In statistical terms, this can be understood as extracting statistics that are maximally sufficient for the main dataset while simultaneously maximally ancillary for the irrelevance dataset. We formulate this as a tradeoff optimization problem and describe its analytic and algorithmic solutions. Our method is demonstrated on a synthetic example and on a real-world application to face images, showing its superiority over other methods such as Oriented Principal Component Analysis.
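The tradeoff described in the abstract can be illustrated with a minimal sketch: choose a feature T = f(X) that maximizes information about the relevant variable Y+ while penalizing information about the irrelevance variable Y-, i.e. maximize I(T;Y+) − γ·I(T;Y-). The sketch below brute-forces a binary feature over synthetic discrete distributions; it is an illustrative toy under assumed distributions, not the paper's continuous SDR algorithm, and all names (`p_main`, `p_irr`, `gamma`) are hypothetical.

```python
# Toy illustration of the sufficiency/ancillarity tradeoff:
# pick a 1-bit feature T = f(X) maximizing I(T;Y+) - gamma * I(T;Y-).
# Synthetic distributions only; NOT the paper's algorithm.
import itertools
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in bits for a joint probability table pxy (rows: x, cols: y)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

def feature_joint(pxy, f):
    """Joint distribution of T = f(X) and Y, for a binary feature f."""
    pty = np.zeros((2, pxy.shape[1]))
    for x, t in enumerate(f):
        pty[t] += pxy[x]  # merge all x with the same feature value t
    return pty

rng = np.random.default_rng(0)
nx = 6
p_main = rng.dirichlet(np.ones(3), size=nx) / nx   # assumed joint p(x, y+)
p_irr  = rng.dirichlet(np.ones(3), size=nx) / nx   # assumed joint p(x, y-)
gamma = 0.5                                        # tradeoff parameter

# Exhaustive search over binary partitions of X (feasible only for tiny nx).
best = max(
    itertools.product([0, 1], repeat=nx),
    key=lambda f: mutual_information(feature_joint(p_main, f))
              - gamma * mutual_information(feature_joint(p_irr, f)),
)
print("best binary feature f(x):", best)
```

The actual method in the paper optimizes continuous feature mappings analytically; this discrete brute force only conveys the shape of the objective being traded off.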
Related papers
Sufficient Dimensionality Reduction with Irrelevant Statistics
The problem of unsupervised dimensionality reduction of stochastic variables while preserving their most relevant characteristics is fundamental for the analysis of complex data. Unfortunately, this problem is ill defined since natural datasets inherently contain alternative underlying structures. In this paper we address this problem by extending the recently introduced "Sufficient Dimen...
Local Kernel Dimension Reduction in Approximate Bayesian Computation
Approximate Bayesian Computation (ABC) is a popular sampling method in applications involving intractable likelihood functions. Without evaluating the likelihood function, ABC approximates the posterior distribution by the set of accepted samples which are simulated with parameters drawn from the prior distribution, where acceptance is determined by distance between the summary statistics of th...
On sufficient dimension reduction for proportional censorship model with covariates
The requirement of constant censoring parameter β in Koziol–Green (KG) model is too restrictive. When covariates are present, the conditional KG model (Veraverbekea and Cadarso-Suárez, 2000) which allows β to be dependent on the covariates is more realistic. In this paper, using sufficient dimension reduction methods, we provide a model-free diagnostic tool to test if β is a function of the cov...
Canonical kernel dimension reduction
A new kernel dimension reduction (KDR) method based on the gradient space of canonical functions is proposed for sufficient dimension reduction (SDR). Similar to existing KDR methods, this new method achieves SDR for arbitrary distributions, but with more flexibility and improved computational efficiency. The choice of loss function in cross-validation is discussed, and a two-stage screening pr...
Sufficient Dimension Reduction Summaries
Observational studies assessing causal or non-causal relationships between an explanatory measure and an outcome can be complicated by hosts of confounding measures. Large numbers of confounders can lead to several biases in conventional regression based estimation. Inference is more easily conducted if we reduce the number of confounders to a more manageable number. We discuss use of sufficien...